A Hybrid Japanese Parser with Hand-crafted Grammar and Statistics

نویسندگان

  • Hiroshi Kanayama
  • Kentaro Torisawa
  • Yutaka Mitsuishi
  • Jun'ichi Tsujii
چکیده

This paper describes a hybrid parsing method for Japanese which uses both a hand-crafted grammar and a statistical technique. The key feature of our system is that in order to estimate likelihood for a parse tree, the system uses information taken from alternative partial parse trees generated by the grammar. This utilization of alternative trees enables us to construct a new statistical model called Triplet/Quadruplet Model. We show that this model can capture a certain tendency in Japanese syntactic structures and this point contributes to improvement of parsing accuracy on a shallow level. We report that, with an underspecified HPSG-based grammar and a maximum entropy estimation, our parser achieved high accuracy: 88.6% accuracy in dependency analysis of the EDR annotated corpus, and that it outperformed other purely statistical parsing methods on the same corpus. This result suggests that proper treatment of hand-crafted grammars can contribute to parsing accuracy on a shallow level.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using lexical statistics to improve HPSG parsing

In this thesis, we describe research aimed at discovering how lexical statistics could improve the performance of a parser that uses hand-crafted grammars in the HPSG framework. Initially, we explain relevant characteristics of the HPSG formalism and of the PET parser that is used for experimentation. The parser and grammar combination is situated in the broader parser space, and we discuss the...

متن کامل

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

متن کامل

Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation

A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004;Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing “deep...

متن کامل

Combining Deep and Shallow Approaches in Parsing German

The paper describes two parsing schemes: a shallow approach based on machine learning and a cascaded finite-state parser with a hand-crafted grammar. It discusses several ways to combine them and presents evaluation results for the two individual approaches and their combination. An underspecification scheme for the output of the finite-state parser is introduced and shown to improve performance.

متن کامل

BANK OF ENGLISH AND BEYOND Hand-crafted parsers for functional annotation

The 200 million word corpus of the Bank of English was annotated morphologically and syntactically using the English Constraint Grammar analyser, a rulebased shallow parser developed at the Research Unit for Computational Linguistics, University of Helsinki. We discuss the annotation system and methods used in the corpus work, as well as the theoretical assumptions of the Constraint Grammar syn...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000